Skip to content

BUG: groupby apply on selected columns yielding scalar (GH13568) #13585

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

jorisvandenbossche
Copy link
Member

@jorisvandenbossche jorisvandenbossche commented Jul 8, 2016

@jorisvandenbossche jorisvandenbossche added Bug Groupby Regression Functionality that used to work in a prior pandas version labels Jul 8, 2016
@jorisvandenbossche jorisvandenbossche added this to the 0.18.2 milestone Jul 8, 2016
@jorisvandenbossche
Copy link
Member Author

@jreback Can you remember the reason that you added the name=self.name in several places in the groupby/rolling/resample refactor? (see 6994240#diff-720d374f1a709d0075a1f0a02445cd65R3376)
I think the fix is correct here, as AFAIU self.name can only be None (and that is already the default for name in Series) or a list (from self._selection) which raises an error as name.

@jreback
Copy link
Contributor

jreback commented Jul 8, 2016

The nested groupy-aplly (e.g. .rolling.groupby.mean does .rolling.groupby.apply(lambda x: mean), and wasn't passing things thru. However the fixing name things was in several places; it may be that this particular change was not needed.

So the path that you are changing is all scalars, so this seems ok. If nothing breaks then you are good to go!.

though the path in the OP is really related to _selection, so this might be a red herring.

In any event I would add a nice comment about why we are not passing name here.

@codecov-io
Copy link

codecov-io commented Jul 8, 2016

Current coverage is 84.33%

Merging #13585 into master will not change coverage

@@             master     #13585   diff @@
==========================================
  Files           138        138          
  Lines         51100      51100          
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
  Hits          43097      43097          
  Misses         8003       8003          
  Partials          0          0          

Powered by Codecov. Last updated by 5605f99...c463e1f

@jorisvandenbossche
Copy link
Member Author

So the path that you are changing is all scalars, so this seems ok. If nothing breaks then you are good to go!.

Indeed, that was my thought as well

though the path in the OP is really related to _selection, so this might be a red herring.

The problem is that I am not familiar enough to know all use / corner cases of self._selection and self.name. At first sight setting name equal to _selection in case of a selection of multiple columns (so for a DataFrameGroupBy) does not seem to make much sense to me, as you never want to use that as the name of a result

Added a comment

@jorisvandenbossche
Copy link
Member Author

Actually, if I just let name return None instead of _selection (except for SeriesGroupBy, there the property is overwritten), the tests pass as well: jorisvandenbossche@a6f5c9e

@jreback
Copy link
Contributor

jreback commented Jul 9, 2016

lgtm

@@ -478,6 +478,7 @@ Bug Fixes
- Bug in ``PeriodIndex`` construction returning a ``float64`` index in some circumstances (:issue:`13067`)
- Bug in ``.resample(..)`` with a ``PeriodIndex`` not changing its ``freq`` appropriately when empty (:issue:`13067`)
- Bug in ``.resample(..)`` with a ``PeriodIndex`` not retaining its type or name with an empty ``DataFrame`` appropriately when empty (:issue:`13212`)
- Bug in ``groupby(..).apply(..)`` when the passed function returns scalar values per group (:issue:`13468`).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

though need to rebase and move this

@jorisvandenbossche jorisvandenbossche force-pushed the bug-13568-groupby-apply-name branch from a9151d1 to c463e1f Compare July 11, 2016 08:44
@jorisvandenbossche jorisvandenbossche merged commit 2f7fdd0 into pandas-dev:master Jul 11, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Groupby Regression Functionality that used to work in a prior pandas version
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Apply on selected columns of a groupby object - stopped working with 0.18.1
3 participants